{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import jax.numpy as np\n",
    "from jax import jit\n",
    "import numpy.random as npr\n",
    "import matplotlib.pyplot as plt\n",
    "from ipywidgets import interact, FloatSlider\n",
    "\n",
    "%load_ext autoreload\n",
    "%autoreload 2\n",
    "%matplotlib inline\n",
    "%config InlineBackend.figure_format = 'retina'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Linear Regression\n",
    "\n",
    "Linear regression is foundational to deep learning. It should be a model that everybody has been exposed to before. However, it is important for us to go through this with a view to how we connect linear regression to the neural diagrams that are shown."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Discussion\n",
    "\n",
    "- In our machine learning toolkit, where do we use linear regression? \n",
    "- What are its advantages? \n",
    "- What are its disadvantages?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Equation Form\n",
    "\n",
    "Linear regression, as a model, is expressed as follows:\n",
    "\n",
    "$$y = wx + b$$\n",
    "\n",
    "Here:\n",
    "\n",
    "- The **model** is the equation, $y = wx + b$.\n",
    "- $y$ is the output data.\n",
    "- $x$ is our input data.\n",
    "- $w$ is a slope parameter.\n",
    "- $b$ is our intercept parameter.\n",
    "- Implicit in the model is the fact that we have transformed $y$ by another function, the \"identity\" function, $f(x) = x$.\n",
    "\n",
    "In this model, $y$ and $x$ are, in a sense, \"fixed\", because this is the data that we have obtained. On the other hand, $w$ and $b$ are the parameters of interest, and *we are interested in **learning** the parameter values for $w$ and $b$ that let our model best explain the data*.\n",
    "\n",
    "I will reveal the punchline early:\n",
    "\n",
    "> The **learning** in deep learning is about figuring out parameter values for a given model."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Make Simulated Data\n",
    "\n",
    "To explore this idea in a bit more depth as applied to a linear regression model, let us start by making some simulated data with a bit of injected noise.\n",
    "\n",
    "You can specify a true $w$ and a true $b$ as you wish, or you can just follow along."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Exercise"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "x = np.linspace(-5, 5, 100)\n",
    "w_true = _____  # exercise: specify ground truth w.\n",
    "b_true = _____  # exercise: specify ground truth b.\n",
    "\n",
    "def noise(n):\n",
    "    return npr.normal(size=(n))\n",
    "\n",
    "# exercise: write the linear equation down.\n",
    "y = _____\n",
    "\n",
    "# Plot ground truth data\n",
    "plt.scatter(x, y)\n",
    "plt.xlabel('x')\n",
    "plt.ylabel('y')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Exercise\n",
    "\n",
    "Now, let's plot what would be a very bad estimate of $w$ and $b$. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Plot a very bad estimate\n",
    "w = _____  # exercise: fill in a bad value for w\n",
    "b = _____   # exercise: fill in a bad value for b\n",
    "y_est = _____  # exercise: fill in the equation.\n",
    "plt.plot(x, y_est, color='red', label='bad model')\n",
    "plt.scatter(x, y, label='data')\n",
    "plt.xlabel('x')\n",
    "plt.ylabel('y')\n",
    "plt.legend()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Loss Function\n",
    "\n",
    "How bad is our model? We can quantify this by looking at a metric called the \"mean squared error\". The mean squared error is defined as \"the average of the sum of squared errors\".\n",
    "\n",
    "We calculate it this way. Firstly, we calculate the **residual** $d$, which is the difference between the **model prediction** $\\hat{y_i}$ for a given sample $i$ and its true value $y_i$:\n",
    "\n",
    "$$\n",
    "d_i = \\hat{y_i} - y_i\n",
    "$$\n",
    "\n",
    "Then we take the sum of square of residuals $r$ over all samples:\n",
    "\n",
    "$$\n",
    "r = \\sum_i^n{d^2_i}\n",
    "$$\n",
    "\n",
    "Finally, we divide it by the number of sample $n$:\n",
    "\n",
    "$$\n",
    "\\text{MSE} = \n",
    "\\frac{r}{n} = \n",
    "\\frac{\n",
    "    \\sum_i^n{\n",
    "        d^2_i\n",
    "    }\n",
    "}{n} = \n",
    "\\frac{\n",
    "    \\sum_i^n{\n",
    "        (\n",
    "            {\n",
    "                \\hat{y_i} - y_i\n",
    "            }\n",
    "        )^2\n",
    "    }\n",
    "}{n}\n",
    "$$\n",
    "\n",
    "\"Mean squared error\" is but one of many **loss functions** that are available in deep learning frameworks. It is commonly used for regression tasks.\n",
    "\n",
    "Loss functions are designed to quantify how bad our model is in predicting the data.\n",
    "\n",
    "### Exercise\n",
    "\n",
    "Let's implement the mean squared error function."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Exercise: implement mean squared error function in NumPy code.\n",
    "# It should take in y_true and y_pred as arguments.\n",
    "def mse(y_true, y_pred):\n",
    "    return _____\n",
    "\n",
    "# Calculate the mean squared error between \n",
    "print(mse(y, y_est))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Activity: Optimize model by hand.\n",
    "\n",
    "Now, we're going to optimize this model by hand. Use the sliders provided to adjust the model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "@interact(\n",
    "    w=FloatSlider(min=-10, max=10, step=0.1), \n",
    "    b=FloatSlider(min=10, max=30, step=0.1)\n",
    ")\n",
    "def optimize_plot(w, b):\n",
    "    y_est = x * w + b\n",
    "    plt.scatter(x, y, alpha=0.3, label='data')\n",
    "    plt.plot(x, y_est, color='red', label='model')\n",
    "    plt.xlabel('x')\n",
    "    plt.ylabel('y')\n",
    "    plt.legend()\n",
    "    plt.title(f'MSE: {mse(y, y_est):.02f}')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Discussion\n",
    "\n",
    "As you were optimizing the model, what did you notice about the MSE score?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Detour: Gradient-Based Optimization\n",
    "\n",
    "Implicit in what you were doing was something we formally call \"gradient-based optimization\". This is a very important point to understand. If you get this for a linear model, you will understand how this works for more complex models. Hence, we are going to go into a small crash-course detour on what gradient-based optimization is.\n",
    "\n",
    "## Derivatives\n",
    "\n",
    "At the risk of ticking off mathematicians for a sloppy definition, for this workshop's purposes, a useful way of defining the derivative is:\n",
    "\n",
    "> How much our output changes as we take a small step on the inputs, taken in the limit of going to very small steps.\n",
    "\n",
    "If we have a function:\n",
    "\n",
    "$$f(w) = w^2 + 3w - 5$$\n",
    "\n",
    "What is the derivative of $f(x)$ with respect to $w$? From first-year undergraduate calculus, we should be able to calculate this:\n",
    "\n",
    "$$f'(w) = 2w + 3$$\n",
    "\n",
    "(We will use the apostrophe marks to indicate derivatives. 1 apostrophe mark means first derivative, 2nd apostrophe mark means 2nd derivative.)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Minimizing $f(w)$ Analytically\n",
    "\n",
    "What is the value of $w$ that minimizes $f(w)$? Again, from undergraduate calculus, we know that at a minima of a function (whether it is a global or local), the first derivative will be equal to zero, i.e. $f'(w) = 0$. By taking advantage of this property, we can analytically solve for the value of $w$ at the minima.\n",
    "\n",
    "$$2w + 3 = 0$$\n",
    "\n",
    "Hence, \n",
    "\n",
    "$$w = -\\frac{3}{2} = -1.5$$\n",
    "\n",
    "To check whether the value of $w$ at the place where $f'(w) = 0$ is a minima or maxima, we can use another piece of knowledge from 1st year undergraduate calculus: The sign of the second derivative will tell us whether this is a minima or maxima.\n",
    "\n",
    "- If the second derivative is positive regardless of the value of $w$, then the point is a minima. (Smiley faces are positive!)\n",
    "- If the second derivative is negative regardless of the value of $w$, then the point is a maxima. (Frowning faces are negative!)\n",
    "\n",
    "Hence, \n",
    "\n",
    "$$f''(w) = 2$$\n",
    "\n",
    "We can see that $f''(w) > 0$ for all $w$, hence the stationary point we find is going to be a local minima."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Minimizing $f(w)$ Computationally\n",
    "\n",
    "An alternative way of looking at this is to take advantage of $f'(w)$, the gradient, evaluated at a particular $w$. A known property of the gradient is that if you take steps in the negative direction of the gradient, you will eventually reach a function's minima. If you take small steps in the positive direction of the gradient, you will reach a function's maxima (if it exists).\n",
    "\n",
    "### Exercise\n",
    "\n",
    "Let's implement this using the function $f(w)$, done using NumPy."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Exercise: Write f(w) as a function.\n",
    "def f(w):\n",
    "    return _____\n",
    "\n",
    "# Exercise: Write df(w) as a function. \n",
    "def df(w):\n",
    "    \"\"\"\n",
    "    The derivative of f with respect to w.\n",
    "    \"\"\"\n",
    "    return _____\n",
    "\n",
    "# Exercise: Pick a number to start w at.\n",
    "w = _____  # start with a float\n",
    "\n",
    "# Now, adjust the value of w 1000 times, taking small steps in the negative direction of the gradient.\n",
    "for i in range(1000):\n",
    "    w = _____    # 0.01 is the size of the step taken.\n",
    "    \n",
    "print(w)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Congratulations, you have just implemented **gradient descent**!\n",
    "\n",
    "Stochastic gradient descent is an **optimization routine**: a way of programming a computer to do optimization for you so that you don't have to do it by hand."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Minimizing $f(w)$ with `jax`\n",
    "\n",
    "`jax` is a Python package for automatically computing gradients; it is known as an \"automatic differentiation\" system. This way, we do not have to specify the gradient function by hand-calculating it; rather, `jax` will know how to automatically take the derivative of a Python function w.r.t. the first argument. With `jax`, our example above is modified in only a slightly different way."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from jax import grad\n",
    "import jax\n",
    "from tqdm import tqdm_notebook as tqdmn\n",
    "\n",
    "def f(w):\n",
    "    return w**2 + 3 * w - 5\n",
    "\n",
    "# This is what changes: we use autograd's `grad` function to automatically return a gradient function.\n",
    "df = _____\n",
    "\n",
    "# Exercise: Pick a number to start w at.\n",
    "w = _____\n",
    "\n",
    "# Now, adjust the value of w 1000 times, taking small steps in the negative direction of the gradient.\n",
    "for i in tqdmn(range(1000)):\n",
    "    w = w - df(w) * 0.01  # 0.01 is the size of the step taken.\n",
    "    \n",
    "    \n",
    "print(w)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Back to Optimizing Linear Regression\n",
    "\n",
    "## What are we optimizing?\n",
    "\n",
    "In linear regression, we are minimizing (i.e. optimizing) the loss function w.r.t. the linear regression parameters.\n",
    "\n",
    "**Keep in mind:** The loss function is the parallel to the $f(w)$ polynomial function that we were playing around with above.\n",
    "\n",
    "## Ingredients for \"Optimizing\" a Model\n",
    "\n",
    "At this point, we have learned what the ingredients are for optimizing a model:\n",
    "\n",
    "1. A model, which is a function that maps inputs $x$ to outputs $y$, and its parameters of the model. In our linear regression case, this is $w$ and $b$; usually, in the literature, we call this a **parameter set** $\\theta$.\n",
    "2. Loss function, which tells us how bad our predictions are.\n",
    "3. Optimization routine, which tells the computer how to adjust the parameter values to minimize the loss function.\n",
    "\n",
    "**Keep note:** Because we are optimizing the loss w.r.t. two parameters, finding the $w$ and $b$ coordinates that minimize the loss is like finding the minima of a bowl.\n",
    "\n",
    "The latter point, which is \"how to adjust the parameter values to minimize the loss function\", is the key point to understand here. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Code-Along Exercise\n",
    "\n",
    "We are going to write code to tie this all together. In order help you follow along, I will type this in full, but you will only have to fill-in-the-blanks. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Exercise: Define the model\n",
    "def model(p, x):\n",
    "    \"\"\"\n",
    "    :param p: parameters\n",
    "    :param x: data\n",
    "    \"\"\"\n",
    "    return _____  # FITB.\n",
    "\n",
    "# Initialize values of w and b. \n",
    "# We will store the parameter values in a dictionary.\n",
    "# The dict keys are the name of the parameter,\n",
    "# and the dict values are the parameter values.\n",
    "params = dict()\n",
    "params['w'] = _____  # FITB\n",
    "params['b'] = _____  # FITB\n",
    "\n",
    "# Differentiable loss function w.r.t. 1st argument\n",
    "def mseloss(params, x, y):\n",
    "    \"\"\"\n",
    "    :param params: parameters to optimize\n",
    "    :param x: input data\n",
    "    :param y: correct outputs\n",
    "    \"\"\"\n",
    "    y_est = _____     # FITB, model is defined in the global scope\n",
    "    return _____         # FITB\n",
    "\n",
    "dmseloss = grad(mseloss)  # derivative of loss function\n",
    "\n",
    "# Optimization routine\n",
    "losses = []\n",
    "for i in tqdmn(range(10000)):\n",
    "    # Evaluate the gradient at the given params values.\n",
    "    grad_p = _____\n",
    "\n",
    "    # Update the gradient values for each parameter.\n",
    "    for name, value in grad_p.items():\n",
    "        _____\n",
    "    losses.append(mseloss(params, x, y))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, let's plot the loss score over time. It should be going downwards."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.plot(losses)\n",
    "plt.xlabel('iteration')\n",
    "plt.ylabel('mse')\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from pprint import pprint\n",
    "\n",
    "pprint(params)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Recap: Ingredients\n",
    "\n",
    "1. Model specification (\"equations\", e.g. $y = wx + b$) and the parameters of the model to be optimized ($w$ and $b$, or more generally, $\\theta$).\n",
    "2. Loss function: tells us how wrong our model parameters are w.r.t. the data (MSE)\n",
    "3. Optimization routine (for-loop)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Linear Regression, Extended\n",
    "\n",
    "Thus far, we have shown linear regression using only one-dimensional inputs (i.e. one column of data). Linear regression has been extended to higher dimensional inputs, i.e. two or more columns of data.\n",
    "\n",
    "## In Pictures\n",
    "\n",
    "Linear regression can be expressed pictorially, not just in equation form. Here are two ways of visualizing linear regression.\n",
    "\n",
    "### Matrix Form\n",
    "\n",
    "Linear regression in one dimension looks like this:\n",
    "\n",
    "![](./figures/linreg-scalar.png)\n",
    "\n",
    "Linear regression in higher dimensions looks like thhis:\n",
    "\n",
    "![](./figures/linreg-matrices.png)\n",
    "\n",
    "### Neural Diagram\n",
    "\n",
    "We can draw a \"neural diagram\" based on the matrix view, with the implicit \"identity\" function included.\n",
    "\n",
    "![](./figures/linreg-neural.png) \n",
    "\n",
    "The neural diagram is one that we commonly see in the introductions to deep learning. As you can see here, linear regression, when visualized this way, can be conceptually thought of as the baseline model for understanding deep learning. \n",
    "\n",
    "The neural diagram also expresses the \"compute graph\" that transforms inputs to outputs."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Break (10 min.)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Extension beyond Linear Regression\n",
    "\n",
    "A key idea from this tutorial is to treat the aforementioned four ingredients as being **modular** components of machine learning. This means that we can swap out the model (and its associated parameters) to fit the problem that we have at hand."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Logistic Regression\n",
    "\n",
    "Logistic regression builds upon linear regression. We use logistic regression to perform **binary classification**, that is, distinguishing between two classes. Typically, we label one of the classes with the integer 0, and the other class with the integer 1."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Model in Pictures\n",
    "\n",
    "To help you build intuition, let's visualize logistic regression using pictures again.\n",
    "\n",
    "### Matrix Form\n",
    "\n",
    "![](./figures/logreg-matrices.png)\n",
    "\n",
    "### Neural Diagram\n",
    "\n",
    "![](./figures/logreg-neural.png)\n",
    "\n",
    "## Interactive Activity\n",
    "\n",
    "As should be evident from the pictures, logistic regression builds upon linear regression simply by **changing the activation function from an \"identity\" function to a \"logistic\" function**. In the one-dimensional case, it has the same two parameters as one-dimensional linear regression, $w$ and $b$. Let's use an interactive visualization to visualize how the parameters $w$ and $b$ affect the shape of the curve."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def logistic(x):\n",
    "    return 1 / (1 + np.exp(-x))\n",
    "\n",
    "\n",
    "@interact(\n",
    "    w=FloatSlider(min=-5, max=5, step=0.1, value=1), \n",
    "    b=FloatSlider(min=-5, max=5, step=0.1, value=1)\n",
    ")\n",
    "def plot_logistic(w, b):\n",
    "    x = np.linspace(-10, 10, 1000)\n",
    "    z = w * x + b  # linear transform on x\n",
    "    y = logistic(z)\n",
    "    plt.plot(x, y)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Discussion\n",
    "\n",
    "1. What do $w$ and $b$ control, respectively? \n",
    "2. Where else do you see logistic-shaped curves?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Make simulated data\n",
    "\n",
    "In this section, we're going to show that we can optimize a logistic regression model using the same set of ingredients. \n",
    "\n",
    "First off, let's start by simulating some data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "x = np.linspace(-5, 5, 100)\n",
    "w = _____\n",
    "b = _____\n",
    "z = w * x + b + npr.random(size=len(x))\n",
    "y_true = np.round(logistic(z))\n",
    "plt.scatter(x, y_true, alpha=0.3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Loss Function\n",
    "\n",
    "How would we quantify how good or bad our model is? In this case, we use the logistic loss function, also known as the binary cross entropy loss function.\n",
    "\n",
    "Expressed in equation form, it looks like this:\n",
    "\n",
    "$$L = -\\sum_i^n (y_i \\log(p_i) + (1-y_i)\\log(1-p_i))$$\n",
    "\n",
    "Here:\n",
    "\n",
    "- $y_i$ is the actual class, namely $1$ or $0$ for each sample $i$.\n",
    "- $p_i$ is the predicted class from the model for each sample $i$.\n",
    "\n",
    "If you're staring at this equation, and thinking that it looks a lot like the Bernoulli distribution log likelihood, you are right!\n",
    "\n",
    "### Discussion\n",
    "\n",
    "Let's think about the loss function for a moment:\n",
    "\n",
    "- What happens to the term $y \\log(p)$ when $y=0$ and $y=1$? What about the $(1-y)\\log(1-p)$ term?\n",
    "- What happens to both terms when $p \\approx 0$ and when $p \\approx 1$ (but still bounded between 0 and 1)?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Exercise\n",
    "\n",
    "Now, let's write down the logistic model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Exercise: Define logistic model\n",
    "def logistic_model(p, x):\n",
    "    \"\"\"\n",
    "    Logistic regression model.\n",
    "    \"\"\"\n",
    "    z = _____\n",
    "    y = _____\n",
    "    return y\n",
    "\n",
    "# Exercise: Define logistic loss function, using flattened parameters\n",
    "def logistic_loss(p, x, y):\n",
    "    preds = _____\n",
    "    return -np.sum(y * np.log(preds) + (1 - y) * np.log(1 - preds))\n",
    "\n",
    "# Exercise: Define gradient of loss function.\n",
    "dlogistic_loss = _____\n",
    "\n",
    "# Exercise: Define parameters' starting points\n",
    "params = dict()\n",
    "params['w'] = _____\n",
    "params['b'] = _____\n",
    "\n",
    "# Exercise: write SGD training loop.\n",
    "losses = []\n",
    "for i in tqdmn(range(5000)): \n",
    "    # Evaluate gradient\n",
    "    grad_p = _____\n",
    "    # Update parameters\n",
    "    for name, value in grad_p.items():\n",
    "        _____\n",
    "    # Keep track of losses\n",
    "    losses.append(logistic_loss(params, x, y_true))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pprint(params)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You'll notice that the values are off from the true value. Why is this so? Partly it's because of the noise that we added, and we also rounded off values."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.plot(losses)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.scatter(x, y_true, alpha=0.3)\n",
    "plt.plot(x, logistic_model(params, x), color='red')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Exercise\n",
    "\n",
    "What if we did not round off the values, and did not add noise to the original data? Try re-running the model without those two."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Neural Networks\n",
    "\n",
    "Neural networks are basically very powerful versions of logistic regressions. Like linear and logistic regression, they also take our data and map it to some output, but does so without ever knowing what the true equation form is.\n",
    "\n",
    "That's all a neural network model is: an arbitrarily powerful model.\n",
    "\n",
    "## Feed Forward Neural Networks in Pictures\n",
    "\n",
    "To give you an intuition for this, let's see one example of a deep neural network in pictures.\n",
    "\n",
    "### Matrix diagram\n",
    "\n",
    "As usual, in a matrix diagram.\n",
    "\n",
    "![](./figures/deepnet_regressor-matrices.png)\n",
    "\n",
    "### Neural diagram\n",
    "\n",
    "And for correspondence, let's visualize this in a neural diagram.\n",
    "\n",
    "![](./figures/deepnet_regressor-neural.png)\n",
    "\n",
    "## Discussion\n",
    "\n",
    "- When would we want to use a neural network? When would we not want to use a neural network?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Real Data\n",
    "\n",
    "We are going to try using some real data from the UCI Machine Learning Repository to something related to our work: QSAR modelling!\n",
    "\n",
    "With the dataset below, we want to predict whether a compound is biodegradable based on a series of 41 chemical descriptors.\n",
    "\n",
    "The dataset was taken from: https://archive.ics.uci.edu/ml/datasets/QSAR+biodegradation#. I have also prepared the data such that it is split into X (the predictors) and Y (the class that we are trying to predict), so that you do not have to worry about manipulating pandas DataFrames.\n",
    "\n",
    "Let's read in the data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "X = pd.read_csv('../data/biodeg_X.csv', index_col=0)\n",
    "y = pd.read_csv('../data/biodeg_y.csv', index_col=0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, let's write a neural network model. This neural network model starts with 41 input nodes, has 1 hidden layer with 20 nodes, and 1 output layer with 1 node. Because this is a classification problem, we will use a logistic activation function right at the end."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def noise(size):\n",
    "    return npr.normal(size=size)\n",
    "\n",
    "# Exercise: Initialize parameters. Show how shapes are determined.\n",
    "params = dict()\n",
    "params['w1'] = _____\n",
    "params['b1'] = _____\n",
    "params['w2'] = _____\n",
    "params['b2'] = _____\n",
    "\n",
    "# Exercise: Write model together.\n",
    "def nn_model(p, x):\n",
    "    # \"a1\" is the activation from layer 1\n",
    "    a1 = _____\n",
    "    # \"a2\" is the activation from layer 2.\n",
    "    # Explain why we need logistic at the end.\n",
    "    a2 = _____\n",
    "    return a2\n",
    "\n",
    "# Exercise: Define logistic loss function, using flattened parameters\n",
    "def logistic_loss(p, x, y):\n",
    "    preds = _____\n",
    "    return -np.sum(y * np.log(preds) + (1 - y) * np.log(1 - preds))\n",
    "\n",
    "dlogistic_loss = _____\n",
    "\n",
    "# Exercise: Write training loop.\n",
    "losses = []\n",
    "for i in tqdmn(range(1000)):\n",
    "    grad_p = _____\n",
    "    for k, v in params.items():\n",
    "        params[k] = _____\n",
    "    losses.append(logistic_loss(params, X.values, y.values))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.plot(losses)\n",
    "plt.title(f\"final loss: {losses[-1]:.2f}\")\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can use a confusion matrix to see how \"confused\" a model was. Read more on [Wikipedia](https://en.wikipedia.org/wiki/Confusion_matrix)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.metrics import confusion_matrix\n",
    "\n",
    "y_pred = model(params, X.values)\n",
    "confusion_matrix(y, np.round(y_pred))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import seaborn as sns\n",
    "\n",
    "sns.heatmap(confusion_matrix(y, np.round(y_pred)))\n",
    "plt.xlabel('predicted')\n",
    "plt.ylabel('actual')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Recap\n",
    "\n",
    "Deep learning, and more generally any modelling, has the following ingredients:\n",
    "\n",
    "1. A model and its associated parameters to be optimized.\n",
    "2. A loss function against which we are optimizing parameters.\n",
    "3. An optimization routine.\n",
    "\n",
    "You have seen these three ingredients at play with 3 different models: a linear regression model, a logistic regression model, and a deep feed forward neural network model.\n",
    "\n",
    "## In Pictures\n",
    "\n",
    "![](./figures/infographic.png) \n",
    "\n",
    "## Caveats of this tutorial\n",
    "\n",
    "Deep learning is an active field of research. I have only shown you the basics here. In addition, I have also intentionally omitted certain aspects of machine learning practice, such as \n",
    "\n",
    "- splitting our data into training and testing sets, \n",
    "- performing model selection using cross-validation\n",
    "- tuning hyperparameters, such as trying out optimizers\n",
    "- regularizing the model, using L1/L2 regularization or dropout\n",
    "- etc.\n",
    "\n",
    "# Parting Thoughts\n",
    "\n",
    "Deep learning is nothing more than optimization of a model with a really large number of parameters.\n",
    "\n",
    "In its current state, it is not artificial intelligence. You should not be afraid of it; it is just a really powerful model that maps X to Y."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "\n",
    "Now that you've finished learning what the foundational components of a deep learning are,\n",
    "come check out the next notebook to learn\n",
    "how deep learning framework developers organize these pieces\n",
    "into useful tools that you can use."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "dl-workshop",
   "language": "python",
   "name": "dl-workshop"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
